499 research outputs found

Testing the Mean Matrix in High-Dimensional Transposable Data

Author: Marioni John C.
Tavaré Simon
Touloumis Anestis
Publication venue: 'Wiley'
Publication date: 23/01/2015

The structural information in high-dimensional transposable data allows us to write the data recorded for each subject in a matrix such that both the rows and the columns correspond to variables of interest. One important problem is to test the null hypothesis that the mean matrix has a particular structure without ignoring the potential dependence structure among and/or between the row and column variables. To address this, we develop a simple and computationally efficient nonparametric testing procedure to assess the hypothesis that, in each predefined subset of columns (rows), the column (row) mean vector remains constant. In simulation studies, the proposed testing procedure seems to have good performance and unlike traditional approaches, it is powerful without leading to inflated nominal sizes. Finally, we illustrate the use of the proposed methodology via two empirical examples from gene expression microarrays. Comment: in Biometrics, 201

arXiv.org e-Print Archive

University of Brighton Research Portal

CpG island composition differences are a source of gene expression noise indicative of promoter responsiveness.

Author: Marioni John C
Morgan Michael D
Publication venue: Genome Biol
Publication date: 01/06/2018

BACKGROUND: Population phenotypic variation can arise from genetic differences between individuals, or from cellular heterogeneity in an isogenic group of cells or organisms. The emergence of gene expression differences between genetically identical cells is referred to as gene expression noise, the sources of which are not well understood. RESULTS: In this work, by studying gene expression noise between multiple cell lineages and mammalian species, we find consistent evidence of a role for CpG islands as sources of gene expression noise. Variation in noise among CpG island promoters can be partially attributed to differences in island size, in which short islands have noisier gene expression. Building on these findings, we investigate the potential for short CpG islands to act as fast response elements to environmental stimuli. Specifically, we find that these islands are enriched amongst primary response genes in SWI/SNF-independent stimuli, suggesting that expression noise is an indicator of promoter responsiveness. CONCLUSIONS: Thus, through the integration of single-cell RNA expression profiling, chromatin landscape and temporal gene expression dynamics, we have uncovered a role for short CpG island promoters as fast response elements

Directory of Open Access Journals

Apollo (Cambridge)

HDTD: analyzing multi-tissue gene expression data

Author: Marioni John C.
Tavaré Simon
Touloumis Anestis
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016

This is the author accepted manuscript. It is currently under an indefinite embargo pending publication by Oxford University Press. Motivation: By collecting multiple samples per subject, researchers can characterise intra-subject variation using physiologically relevant measurements such as gene expression profiling. This can yield important insights into fundamental biological questions ranging from cell type identity to tumour development. For each subject, the data measurements can be written as a matrix with the different subsamples (e.g., multiple tissues) indexing the columns and the genes indexing the rows. In this context, neither the genes nor the tissues are expected to be independent and straightforward application of traditional statistical methods that ignore this two-way dependence might lead to erroneous conclusions. Herein, we present a suite of tools embedded within the R/Bioconductor package HDTD for robustly estimating and performing hypothesis tests about the mean relationship and the covariance structure within the rows and columns. We illustrate the utility of HDTD by applying it to analyze data generated by the Genotype-Tissue Expression consortium

University of Brighton Research Portal

Apollo (Cambridge)

HDTD: analyzing multi-tissue gene expression data.

Author: Marioni John C
Tavaré Simon
Touloumis Anestis
Publication venue: Bioinformatics
Publication date: 01/01/2016

MOTIVATION: By collecting multiple samples per subject, researchers can characterize intra-subject variation using physiologically relevant measurements such as gene expression profiling. This can yield important insights into fundamental biological questions ranging from cell type identity to tumour development. For each subject, the data measurements can be written as a matrix with the different subsamples (e.g. multiple tissues) indexing the columns and the genes indexing the rows. In this context, neither the genes nor the tissues are expected to be independent and straightforward application of traditional statistical methods that ignore this two-way dependence might lead to erroneous conclusions. Herein, we present a suite of tools embedded within the R/Bioconductor package HDTD for robustly estimating and performing hypothesis tests about the mean relationship and the covariance structure within the rows and columns. We illustrate the utility of HDTD by applying it to analyze data generated by the Genotype-Tissue Expression consortium. AVAILABILITY AND IMPLEMENTATION: The R package HDTD is part of Bioconductor. The source code and a comprehensive user's guide are available at http://bioconductor.org/packages/release/bioc/html/HDTD.html CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. This is the author accepted manuscript. It is currently under an indefinite embargo pending publication by Oxford University Press

University of Brighton Research Portal

PubMed Central

Apollo (Cambridge)

Mosaic autosomal aneuploidies are detectable from single-cell RNAseq data.

Author: Griffiths Jonathan A
Marioni John C
Scialdone Antonio
Publication venue: BMC Genomics
Publication date: 01/11/2017

BACKGROUND: Aneuploidies are copy number variants that affect entire chromosomes. They are seen commonly in cancer, embryonic stem cells, human embryos, and in various trisomic diseases. Aneuploidies frequently affect only a subset of cells in a sample; this is known as "mosaic" aneuploidy. A cell that harbours an aneuploidy exhibits disrupted gene expression patterns which can alter its behaviour. However, detection of aneuploidies using conventional single-cell DNA-sequencing protocols is slow and expensive. METHODS: We have developed a method that uses chromosome-wide expression imbalances to identify aneuploidies from single-cell RNA-seq data. The method provides quantitative aneuploidy calls, and is integrated into an R software package available on GitHub and as an Additional file of this manuscript. RESULTS: We validate our approach using data with known copy number, identifying the vast majority of aneuploidies with a low rate of false discovery. We show further support for the method's efficacy by exploiting allele-specific gene expression levels, and differential expression analyses. CONCLUSIONS: The method is quick and easy to apply, straightforward to interpret, and represents a substantial cost saving compared to single-cell genome sequencing techniques. However, the method is less well suited to data where gene expression is highly variable. The results obtained from the method can be used to investigate the consequences of aneuploidy itself, or to exclude aneuploidy-affected expression values from conventional scRNA-seq data analysis

Directory of Open Access Journals

Apollo (Cambridge)

Using single-cell genomics to understand developmental processes and cell fate decisions.

Author: Griffiths Jonathan A
Marioni John C
Scialdone Antonio
Publication venue: Molecular Systems Biology
Publication date: 16/04/2018

High-throughput -omics techniques have revolutionised biology, allowing for thorough and unbiased characterisation of the molecular states of biological systems. However, cellular decision-making is inherently a unicellular process to which "bulk" -omics techniques are poorly suited, as they capture ensemble averages of cell states. Recently developed single-cell methods bridge this gap, allowing high-throughput molecular surveys of individual cells. In this review, we cover core concepts of analysis of single-cell gene expression data and highlight areas of developmental biology where single-cell techniques have made important contributions. These include understanding of cell-to-cell heterogeneity, the tracing of differentiation pathways, quantification of gene expression from specific alleles, and the future directions of cell lineage tracing and spatial gene expression analysis. J.A.G. was supported by Wellcome Trust Grant “Systematic Identification of Lineage Specification in Murine Gastrulation” (109081/Z/15/A). A.S. was supported by Wellcome Trust Grant “Tracing early mammalian lineage decisions by single cell genomics” (105031/B/14/Z). J.C.M. was supported by core funding from Cancer Research UK (award no. A17197) and EMBL

Apollo (Cambridge)

f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq.

Author: Buettner Florian
Marioni John C
McCarthy Davis J
Pratanwanich Naruemon
Stegle Oliver
Publication venue: Genome Biol
Publication date: 01/01/2017

Single-cell RNA-sequencing (scRNA-seq) allows studying heterogeneity in gene expression in large cell populations. Such heterogeneity can arise due to technical or biological factors, making decomposing sources of variation difficult. We here describe f-scLVM (factorial single-cell latent variable model), a method based on factor analysis that uses pathway annotations to guide the inference of interpretable factors underpinning the heterogeneity. Our model jointly estimates the relevance of individual factors, refines gene set annotations, and infers factors without annotation. In applications to multiple scRNA-seq datasets, we find that f-scLVM robustly decomposes scRNA-seq datasets into interpretable components, thereby facilitating the identification of novel subpopulations

Directory of Open Access Journals

PuSH

Apollo (Cambridge)

University of Melbourne Institutional Repository

FigShare

Global Biobank Meta-analysis Initiative: powering genetic discovery across human disease

Author: Campbell Archie
Hayward Caroline
Marioni Riccardo E
Porteous David John
Richmond Anne
Publication date: 12/10/2022

Edinburgh Research Explorer

Correcting the Mean-Variance Dependency for Differential Variability Testing Using Single-Cell RNA Sequencing Data.

Author: Eling Nils
Marioni John C
Richard Arianne C
Richardson Sylvia
Vallejos Catalina A
Publication venue: Cell Syst
Publication date: 26/09/2018

Cell-to-cell transcriptional variability in otherwise homogeneous cell populations plays an important role in tissue function and development. Single-cell RNA sequencing can characterize this variability in a transcriptome-wide manner. However, technical variation and the confounding between variability and mean expression estimates hinder meaningful comparison of expression variability between cell populations. To address this problem, we introduce an analysis approach that extends the BASiCS statistical framework to derive a residual measure of variability that is not confounded by mean expression. This includes a robust procedure for quantifying technical noise in experiments where technical spike-in molecules are not available. We illustrate how our method provides biological insight into the dynamics of cell-to-cell expression variability, highlighting a synchronization of biosynthetic machinery components in immune cells upon activation. In contrast to the uniform up-regulation of the biosynthetic machinery, CD4+ T cells show heterogeneous up-regulation of immune-related and lineage-defining genes during activation and differentiation. NE was funded by the European Molecular Biology Laboratory (EMBL) international PhD programme. ACR was funded by the MRC Skills Development Fellowship (MR/P014178/1). SR was funded by MRC grant MC_UP_0801/1. JCM was funded by core support of Cancer Research UK and EMBL. CAV was funded by The Alan Turing Institute, EPSRC grant EP/N510129/1

Edinburgh Research Explorer

Apollo (Cambridge)